Automated design of multidimensional clustering tables for relational databases
Abstract
The ability to physically cluster a database table on multiple dimensions is a powerful technique that offers significant performance benefits in many OLAP, warehousing, and decision-support systems. An industrial implementation of this technique for the DB2® Universal Database™ (DB2 UDB) product, called multidimensional clustering (MDC), which coexists with other classical data storage and indexing methods, was described in VLDB 2003. This paper describes the first published model for automating the selection of clustering keys in single-dimensional and multidimensional relational databases that use a cell/block storage structure for MDC. For any significant dimensionality (three or more), the space of possible solutions is combinatorially complex. The automated MDC design model is based on what-if query cost modeling, data sampling, and a search algorithm that evaluates a large number of possible combinations. The model is effective at trading off the benefits of candidate combinations of clustering keys against data sparsity and performance. It also effectively selects the granularity at which each dimension should be used for clustering (such as week of year versus month of year). We show results from experiments indicating that the model provides design recommendations of comparable quality to those made by human experts. The model has been implemented in the IBM® DB2 UDB for Linux®, UNIX® and Windows® Version 8.2 release.
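The trade-off the abstract describes, query benefit against sparsity across combinations of dimensions and granularities, can be illustrated with a small sketch. This is not the paper's algorithm: the function names (`count_blocks`, `choose_clustering`), the additive benefit scores (a stand-in for what-if cost estimates), the block size, the expansion limit, and the synthetic sample are all assumptions made for illustration. The only storage rule it assumes is that every distinct cell occupies at least one whole block.

```python
from itertools import combinations, product
from math import ceil


def count_blocks(rows, key_funcs, rows_per_block):
    """Blocks needed if each distinct cell is stored in its own whole blocks."""
    cells = {}
    for row in rows:
        cell = tuple(f(row) for f in key_funcs)
        cells[cell] = cells.get(cell, 0) + 1
    return sum(ceil(n / rows_per_block) for n in cells.values())


def choose_clustering(rows, candidates, benefit, rows_per_block, max_expansion):
    """Exhaustively search dimension/granularity combinations.

    candidates: {dimension: {granularity: key_func}}
    benefit:    {(dimension, granularity): score}, hypothetical numbers that
                stand in for what-if optimizer estimates; summed for simplicity.
    Returns (choice, gain) for the best combination whose storage expansion
    stays within max_expansion, or ((), 0.0) if none qualifies.
    """
    baseline = ceil(len(rows) / rows_per_block)  # blocks with no clustering
    best = ((), 0.0)
    dims = sorted(candidates)
    for k in range(1, len(dims) + 1):
        for subset in combinations(dims, k):
            for grans in product(*(sorted(candidates[d]) for d in subset)):
                choice = tuple(zip(subset, grans))
                funcs = [candidates[d][g] for d, g in choice]
                expansion = count_blocks(rows, funcs, rows_per_block) / baseline
                if expansion > max_expansion:
                    continue  # too sparse: too many partly filled blocks
                gain = sum(benefit[c] for c in choice)
                if gain > best[1]:
                    best = (choice, gain)
    return best


# Synthetic sample: 360 days x 4 regions x 2 rows each = 2880 rows.
rows = [(day, region) for day in range(1, 361) for region in range(4) for _ in range(2)]

candidates = {
    "date": {
        "week": lambda r: (r[0] - 1) // 7,
        "month": lambda r: (r[0] - 1) // 30,  # simplified 30-day months
    },
    "region": {"region": lambda r: r[1]},
}
benefit = {("date", "week"): 10.0, ("date", "month"): 8.0, ("region", "region"): 5.0}

best = choose_clustering(rows, candidates, benefit, rows_per_block=50, max_expansion=2.0)
print(best)
```

With these made-up numbers, clustering on week and region together would more than triple the block count, so the search settles on the coarser month granularity combined with region. A real design tool would take its benefit figures from optimizer cost estimates and its cell counts from sampled data rather than from a full scan.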
Similar resources
A Logical Approach to Multidimensional Databases
In this paper we present MD, a logical model for OLAP systems, and show how it can be used in the design of multidimensional databases. Unlike other models for multidimensional databases, MD is independent of any specific implementation (relational or proprietary multidimensional) and as such it provides a clear separation between practical and conceptual aspects. In this framework, we present a...
Algorithms for merged indexes
Merged indexes are B-trees that contain multiple traditional indexes and interleave their records based on a common sort order. In relational databases, merged indexes implement “master-detail clustering” of related records, e.g., orders and order details. Thus, merged indexes shift de-normalization from the logical level of tables and rows to the physical level of indexes and records, which is...
On the Scalability of Multidimensional Databases
It is commonly accepted in the practice of on-line analytical processing databases that the multidimensional database organization is less scalable than the relational one. It is easy to see that the size of the multidimensional organization may increase very quickly. For example, if we introduce one additional dimension, then the total number of possible cells will be at least doubled. However...
MRC: Multi Relational Clustering approach
Clustering is the process of partitioning data objects into groups based on similarity measures. Most existing methods perform clustering within a single table, but most real-world databases store information in multiple tables. We propose a new method, called Multi Relational Clustering (MRC), for clustering a relational database. The MRC approach uses existi...
Implementation of Multidimensional Index Structures for Knowledge Discovery in Relational Databases
Efficient query processing is one of the basic needs for data mining algorithms. Clustering algorithms, association rule mining algorithms and OLAP tools all rely on efficient query processors being able to deal with high-dimensional data. Inside such a query processor, multidimensional index structures are used as a basic technique. As the implementation of such an index structures is a diffic...
Publication date: 2004